Adversarial examples are augmented data points generated by imperceptible perturbation of input samples. They have recently drawn much attention within the machine learning and data mining communities. Being difficult to distinguish from real examples, such adversarial examples can change the prediction of many of the best learning models, including state-of-the-art deep learning models. Recent attempts have been made to build robust models that take adversarial examples into account. However, these methods can either lead to performance drops or lack mathematical motivation. In this paper, we propose a unified framework for building machine learning models that are robust to adversarial examples. More specifically, using this framework, we develop a family of gradient regularization methods that effectively penalize the gradient of the loss function with respect to the inputs. The proposed framework is appealing in that it offers a unified view of how to deal with adversarial examples and incorporates another recently proposed perturbation-based approach as a special case. In addition, we present visualizations that reveal semantic meaning in these perturbations, which supports our regularization method and provides another explanation for the generalizability of adversarial examples. Applying this technique to Maxout networks, we conduct a series of experiments and achieve encouraging results on two benchmark datasets. In particular, we attain the best accuracy on MNIST data (without data augmentation) and competitive performance on CIFAR-10 data.
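To make the penalty on the input gradient concrete, the following is a minimal sketch, not the paper's exact formulation: a logistic-regression loss augmented with the squared norm of the gradient of the per-example loss with respect to the input. The model, the toy data, and the penalty weight lam are illustrative assumptions; for this model the input gradient has the closed form (p - y) * w, which the sketch uses directly.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def gradient_regularized_loss(w, b, X, y, lam=0.1):
        # Cross-entropy loss plus a penalty on the gradient of the
        # per-example loss with respect to the inputs.
        # For logistic regression, grad_x L = (p - y) * w, so its
        # squared norm is (p - y)**2 * ||w||**2 (hypothetical setup).
        p = sigmoid(X @ w + b)                      # predicted probabilities
        eps = 1e-12                                 # numerical stability
        ce = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        grad_x_sq_norm = np.mean((p - y) ** 2) * np.dot(w, w)
        return ce + lam * grad_x_sq_norm

    # Toy usage with random data (purely illustrative).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(32, 10))
    y = rng.integers(0, 2, size=32).astype(float)
    w = rng.normal(size=10)
    b = 0.0
    print(gradient_regularized_loss(w, b, X, y, lam=0.1))

In deep models the input gradient has no such closed form and would instead be obtained by automatic differentiation; the sketch only illustrates how the regularizer enters the objective.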